We will normally meet Tuesdays. Please note that we will be meeting on the following Thursdays (not Tuesdays) because of my travel schedule:
today
January 30
February 6
February 13
March 20
Final exam
The exam will be remote on April 30 from 9am to 12pm.
Questions?
Introduction
Objective
This is the first in a two-course sequence designed to help you become competent quantitative researchers in sociology.
This includes learning proper decision making, explanation, computation, visualization, and interpretation.
Data and Variables
Data structure
| country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---|---|---|---|
| Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453 |
| Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.8530 |
| Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.1007 |
| Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.1971 |
| Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.9811 |
| Afghanistan | Asia | 1977 | 38.438 | 14880372 | 786.1134 |
| Afghanistan | Asia | 1982 | 39.854 | 12881816 | 978.0114 |
| Afghanistan | Asia | 1987 | 40.822 | 13867957 | 852.3959 |
| Afghanistan | Asia | 1992 | 41.674 | 16317921 | 649.3414 |
| Afghanistan | Asia | 1997 | 41.763 | 22227415 | 635.3414 |
Tidy format: each column is a variable and each row is an observation.
Untidy data
| country | continent | lifeExp_1952 | lifeExp_1957 | lifeExp_1962 | lifeExp_1967 | lifeExp_1972 | lifeExp_1977 | lifeExp_1982 | lifeExp_1987 | lifeExp_1992 | lifeExp_1997 | lifeExp_2002 | lifeExp_2007 | pop_1952 | pop_1957 | pop_1962 | pop_1967 | pop_1972 | pop_1977 | pop_1982 | pop_1987 | pop_1992 | pop_1997 | pop_2002 | pop_2007 | gdpPercap_1952 | gdpPercap_1957 | gdpPercap_1962 | gdpPercap_1967 | gdpPercap_1972 | gdpPercap_1977 | gdpPercap_1982 | gdpPercap_1987 | gdpPercap_1992 | gdpPercap_1997 | gdpPercap_2002 | gdpPercap_2007 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | Asia | 28.801 | 30.332 | 31.997 | 34.020 | 36.088 | 38.438 | 39.854 | 40.822 | 41.674 | 41.763 | 42.129 | 43.828 | 8425333 | 9240934 | 10267083 | 11537966 | 13079460 | 14880372 | 12881816 | 13867957 | 16317921 | 22227415 | 25268405 | 31889923 | 779.4453 | 820.8530 | 853.1007 | 836.1971 | 739.9811 | 786.1134 | 978.0114 | 852.3959 | 649.3414 | 635.3414 | 726.7341 | 974.5803 |
| Albania | Europe | 55.230 | 59.280 | 64.820 | 66.220 | 67.690 | 68.930 | 70.420 | 72.000 | 71.581 | 72.950 | 75.651 | 76.423 | 1282697 | 1476505 | 1728137 | 1984060 | 2263554 | 2509048 | 2780097 | 3075321 | 3326498 | 3428038 | 3508512 | 3600523 | 1601.0561 | 1942.2842 | 2312.8890 | 2760.1969 | 3313.4222 | 3533.0039 | 3630.8807 | 3738.9327 | 2497.4379 | 3193.0546 | 4604.2117 | 5937.0295 |
| Algeria | Africa | 43.077 | 45.685 | 48.303 | 51.407 | 54.518 | 58.014 | 61.368 | 65.799 | 67.744 | 69.152 | 70.994 | 72.301 | 9279525 | 10270856 | 11000948 | 12760499 | 14760787 | 17152804 | 20033753 | 23254956 | 26298373 | 29072015 | 31287142 | 33333216 | 2449.0082 | 3013.9760 | 2550.8169 | 3246.9918 | 4182.6638 | 4910.4168 | 5745.1602 | 5681.3585 | 5023.2166 | 4797.2951 | 5288.0404 | 6223.3675 |
| Angola | Africa | 30.015 | 31.999 | 34.000 | 35.985 | 37.928 | 39.483 | 39.942 | 39.906 | 40.647 | 40.963 | 41.003 | 42.731 | 4232095 | 4561361 | 4826015 | 5247469 | 5894858 | 6162675 | 7016384 | 7874230 | 8735988 | 9875024 | 10866106 | 12420476 | 3520.6103 | 3827.9405 | 4269.2767 | 5522.7764 | 5473.2880 | 3008.6474 | 2756.9537 | 2430.2083 | 2627.8457 | 2277.1409 | 2773.2873 | 4797.2313 |
| Argentina | Americas | 62.485 | 64.399 | 65.142 | 65.634 | 67.065 | 68.481 | 69.942 | 70.774 | 71.868 | 73.275 | 74.340 | 75.320 | 17876956 | 19610538 | 21283783 | 22934225 | 24779799 | 26983828 | 29341374 | 31620918 | 33958947 | 36203463 | 38331121 | 40301927 | 5911.3151 | 6856.8562 | 7133.1660 | 8052.9530 | 9443.0385 | 10079.0267 | 8997.8974 | 9139.6714 | 9308.4187 | 10967.2820 | 8797.6407 | 12779.3796 |
| Australia | Oceania | 69.120 | 70.330 | 70.930 | 71.100 | 71.930 | 73.490 | 74.740 | 76.320 | 77.560 | 78.830 | 80.370 | 81.235 | 8691212 | 9712569 | 10794968 | 11872264 | 13177000 | 14074100 | 15184200 | 16257249 | 17481977 | 18565243 | 19546792 | 20434176 | 10039.5956 | 10949.6496 | 12217.2269 | 14526.1246 | 16788.6295 | 18334.1975 | 19477.0093 | 21888.8890 | 23424.7668 | 26997.9366 | 30687.7547 | 34435.3674 |
| Austria | Europe | 66.800 | 67.480 | 69.540 | 70.140 | 70.630 | 72.170 | 73.180 | 74.940 | 76.040 | 77.510 | 78.980 | 79.829 | 6927772 | 6965860 | 7129864 | 7376998 | 7544201 | 7568430 | 7574613 | 7578903 | 7914969 | 8069876 | 8148312 | 8199783 | 6137.0765 | 8842.5980 | 10750.7211 | 12834.6024 | 16661.6256 | 19749.4223 | 21597.0836 | 23687.8261 | 27042.0187 | 29095.9207 | 32417.6077 | 36126.4927 |
| Bahrain | Asia | 50.939 | 53.832 | 56.923 | 59.923 | 63.300 | 65.593 | 69.052 | 70.750 | 72.601 | 73.925 | 74.795 | 75.635 | 120447 | 138655 | 171863 | 202182 | 230800 | 297410 | 377967 | 454612 | 529491 | 598561 | 656397 | 708573 | 9867.0848 | 11635.7995 | 12753.2751 | 14804.6727 | 18268.6584 | 19340.1020 | 19211.1473 | 18524.0241 | 19035.5792 | 20292.0168 | 23403.5593 | 29796.0483 |
| Bangladesh | Asia | 37.484 | 39.348 | 41.216 | 43.453 | 45.252 | 46.923 | 50.009 | 52.819 | 56.018 | 59.412 | 62.013 | 64.062 | 46886859 | 51365468 | 56839289 | 62821884 | 70759295 | 80428306 | 93074406 | 103764241 | 113704579 | 123315288 | 135656790 | 150448339 | 684.2442 | 661.6375 | 686.3416 | 721.1861 | 630.2336 | 659.8772 | 676.9819 | 751.9794 | 837.8102 | 972.7700 | 1136.3904 | 1391.2538 |
| Belgium | Europe | 68.000 | 69.240 | 70.250 | 70.940 | 71.440 | 72.800 | 73.930 | 75.350 | 76.460 | 77.530 | 78.320 | 79.441 | 8730405 | 8989111 | 9218400 | 9556500 | 9709100 | 9821800 | 9856303 | 9870200 | 10045622 | 10199787 | 10311970 | 10392226 | 8343.1051 | 9714.9606 | 10991.2068 | 13149.0412 | 16672.1436 | 19117.9745 | 20979.8459 | 22525.5631 | 25575.5707 | 27561.1966 | 30485.8838 | 33692.6051 |
| Benin | Africa | 38.223 | 40.358 | 42.618 | 44.885 | 47.014 | 49.190 | 50.904 | 52.337 | 53.919 | 54.777 | 54.406 | 56.728 | 1738315 | 1925173 | 2151895 | 2427334 | 2761407 | 3168267 | 3641603 | 4243788 | 4981671 | 6066080 | 7026113 | 8078314 | 1062.7522 | 959.6011 | 949.4991 | 1035.8314 | 1085.7969 | 1029.1613 | 1277.8976 | 1225.8560 | 1191.2077 | 1232.9753 | 1372.8779 | 1441.2849 |
| Bolivia | Americas | 40.414 | 41.890 | 43.428 | 45.032 | 46.714 | 50.023 | 53.859 | 57.251 | 59.957 | 62.050 | 63.883 | 65.554 | 2883315 | 3211738 | 3593918 | 4040665 | 4565872 | 5079716 | 5642224 | 6156369 | 6893451 | 7693188 | 8445134 | 9119152 | 2677.3263 | 2127.6863 | 2180.9725 | 2586.8861 | 2980.3313 | 3548.0978 | 3156.5105 | 2753.6915 | 2961.6997 | 3326.1432 | 3413.2627 | 3822.1371 |
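As a rough sketch of how a wide table like the one above could be reshaped into tidy format, the example below uses `pivot_longer()` from tidyr. The data frame name `wide` is hypothetical, and only two countries and two years are typed in by hand to keep the example short; the values are copied from the table above.

```r
library(tidyr)
library(tibble)

# Hypothetical miniature of the wide table: two countries, 1952 and 1957 only
wide <- tribble(
  ~country,      ~continent, ~lifeExp_1952, ~lifeExp_1957, ~pop_1952, ~pop_1957, ~gdpPercap_1952, ~gdpPercap_1957,
  "Afghanistan", "Asia",     28.801,        30.332,        8425333,   9240934,   779.4453,        820.8530,
  "Albania",     "Europe",   55.230,        59.280,        1282697,   1476505,   1601.0561,       1942.2842
)

# Split each column name at "_": the first piece names the variable
# (".value" keeps it as a column), the second piece becomes the year
tidy <- wide |>
  pivot_longer(
    cols = -c(country, continent),
    names_to = c(".value", "year"),
    names_sep = "_",
    names_transform = list(year = as.integer)
  )

tidy
```

The result has one row per country-year and one column per variable, which matches the tidy layout shown earlier.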
Types of variables
Ratio: dollars; points (e.g., basketball)
Interval: degrees Celsius
Ordinal: clothing sizes; Likert scales
Nominal: race; sex; country
The first two types are numeric (often called continuous). The last two types are categorical. Ordinal variables are often treated as numeric, and this is usually fine.
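The distinctions above can be sketched in base R. This is only an illustration with made-up values: the vectors `continent`, `size`, and `celsius` are hypothetical and not part of the course data.

```r
# Nominal: unordered categories
continent <- factor(c("Asia", "Europe", "Africa"))

# Ordinal: ordered categories (clothing sizes)
size <- factor(c("M", "S", "L", "M"), levels = c("S", "M", "L"), ordered = TRUE)

# Order comparisons are meaningful for ordinal variables
size < "L"

# Treating an ordinal variable as numeric uses the rank of each level
size_num <- as.integer(size)
mean(size_num)

# Interval: differences are meaningful, ratios are not.
# 20 degrees C is not "twice as hot" as 10 degrees C, because 0 is arbitrary;
# the ratio changes once the same temperatures are expressed in Kelvin.
celsius <- c(10, 20)
kelvin <- celsius + 273.15
celsius[2] / celsius[1]
kelvin[2] / kelvin[1]
```

Converting `size` to integers assumes the gaps between adjacent sizes are equal, which is the assumption being made whenever an ordinal variable is treated as numeric.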
Let’s investigate this using the gapminder data. First of all, we’ll keep only the most recent (2007) data.
library(gapminder)
library(dplyr)

d <- gapminder |>
  filter(year == max(year)) |> # keep 2007
  select(-year)                # don't need column
| country | continent | lifeExp | pop | gdpPercap |
|---|---|---|---|---|
| Afghanistan | Asia | 43.828 | 31889923 | 974.5803 |
| Albania | Europe | 76.423 | 3600523 | 5937.0295 |
| Algeria | Africa | 72.301 | 33333216 | 6223.3675 |
| Angola | Africa | 42.731 | 12420476 | 4797.2313 |
| Argentina | Americas | 75.320 | 40301927 | 12779.3796 |
| Australia | Oceania | 81.235 | 20434176 | 34435.3674 |
| Austria | Europe | 79.829 | 8199783 | 36126.4927 |
| Bahrain | Asia | 75.635 | 708573 | 29796.0483 |
| Bangladesh | Asia | 64.062 | 150448339 | 1391.2538 |
| Belgium | Europe | 79.441 | 10392226 | 33692.6051 |
What kinds of variables are these?
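One possible starting point is to check how R stores each column. The sketch below builds a small hypothetical data frame, `d_small`, from the first three rows shown above. The column types are an assumption, chosen here to mirror how the gapminder package is understood to store them.

```r
# Hypothetical three-row copy of the 2007 data; column types are assumed
d_small <- data.frame(
  country   = factor(c("Afghanistan", "Albania", "Algeria")),
  continent = factor(c("Asia", "Europe", "Africa")),
  lifeExp   = c(43.828, 76.423, 72.301),
  pop       = c(31889923L, 3600523L, 33333216L),
  gdpPercap = c(974.5803, 5937.0295, 6223.3675)
)

# Storage class of each column
sapply(d_small, class)
```

The storage class is only a hint. Factors suggest categorical variables and numbers suggest numeric ones, but deciding between nominal and ordinal, or between interval and ratio, still depends on what the variable means.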
The origins of “statistics”
The word “statistics” originally referred to information about the state. We’ll focus on information like this for now rather than thinking about samples of individuals.
Visualization basics
Consider two types of plots
univariate plots
bivariate plots
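These correspond to types of distributions: a univariate plot shows the distribution of one variable, and a bivariate plot shows the joint distribution of two variables.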
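A minimal sketch of the two plot types, assuming the gapminder, dplyr, and ggplot2 packages are installed. The choice of `lifeExp` and `gdpPercap`, and of a histogram and a scatterplot, is only for illustration; the object names `p_uni` and `p_bi` are hypothetical.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Keep only the most recent year (2007), as before
d <- gapminder |>
  filter(year == max(year))

# Univariate: the distribution of one variable
p_uni <- ggplot(d, aes(x = lifeExp)) +
  geom_histogram(bins = 20)

# Bivariate: the joint distribution of two variables
p_bi <- ggplot(d, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

p_uni
p_bi
```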
Univariate plots
library(ggplot2)

# density plot
ggplot(d, aes(x = gdpPercap)) +
  geom_density()